Comments on: “Model Complexity Control for Regression Using VC
Authors
Abstract
In [1], several model selection approaches were compared experimentally. One of the criteria considered was the Schwarz Information Criterion (SIC), but it was implemented incorrectly, and the same mistake has been repeated in more recent papers. Here we show why the SIC formula originally used was wrong and give the correct approach, which is well known in the statistics literature. By repeating several experiments from the original paper, we then show that SIC performs far better than reported in [1]. Nevertheless, we confirm that VC-based model selection is more powerful than SIC, especially for small samples.
Similar resources
High-dimensional classification by sparse logistic regression
We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic bounds for the resulting misclassification excess risk. The bounds can be reduced under the additional low-noise condition. The proposed complexity penalty ...
Comparison of Model Selection for Regression
We discuss empirical comparison of analytical methods for model selection. Currently, there is no consensus on the best method for finite-sample estimation problems, even for the simple case of linear estimators. This article presents empirical comparisons between classical statistical methods - Akaike information criterion (AIC) and Bayesian information criterion (BIC) - and the structural ris...
Measuring the VC-dimension Using Optimized Experimental
VC-dimension is the measure of model complexity (capacity) used in VC-theory. The knowledge of the VC-dimension of an estimator is necessary for rigorous complexity control using analytic VC generalization bounds. Unfortunately, it is not possible to obtain the analytic estimates of the VC-dimension in most cases. Hence, it has been recently proposed to measure the VC-dimension of an estimat...
Investigating the relationship among complexity, range, and strength of grammatical knowledge of EFL students
Assessment of grammatical knowledge is a rather neglected area of research in the field with many open questions (Purpura, 2004). The present research incorporates recent proposals about the nature of grammatical development to create a framework consisting of dimensions of complexity, range and strength, and studies which dimension(s) can best predict the stat...
Penalty Functions for Genetic Programming Algorithms
Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as well as possible but no notion of generalization error is considered. As a consequence, overfitting, code-bloat and noisy data are problems which are not satisfactorily solved under this approach. Motivated by...